

Socratic Mind: Impact of a Novel GenAI-Powered Assessment Tool on Student Learning and Higher-Order Thinking

Lee, Jeonghyun, Hung, Jui-Tse, Soylu, Meryem Yilmaz, Popescu, Diana, Cui, Christopher Zhang, Grigoryan, Gayane, Joyner, David A, Harmon, Stephen W

arXiv.org Artificial Intelligence

This study examines the impact of Socratic Mind, a Generative Artificial Intelligence (GenAI)-powered formative assessment tool that employs Socratic questioning to support student learning in a large, fully online undergraduate-level computing course. Employing a quasi-experimental, mixed-methods design, we investigated participants' engagement patterns, the influence of user experience on engagement, and impacts on both perceived and actual learning outcomes. Data were collected from system logs, surveys on user experience and perceived engagement and learning gains, student reflections, and course performance data. Results indicated that participants consistently reported high levels of affective, behavioral, and cognitive engagement, and these were strongly linked to positive user experiences and perceived learning outcomes. Quantitative analysis further revealed that students who engaged with the GenAI tool experienced significant gains in their quiz scores compared to those who did not, with students with lower baseline achievement benefiting in particular. Additionally, thematic analysis of qualitative feedback revealed substantial perceived improvements in higher-order thinking skills, including problem solving, critical thinking, and self-reflection. Our findings highlight the promise of AI-mediated dialogue in fostering deeper engagement and higher-order cognitive skills. As higher education institutions expand GenAI integration into the curriculum, this dialogic, GenAI-powered assessment tool offers a scalable strategy to promote meaningful learning outcomes for students.
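The dialogic mechanism the abstract describes can be sketched as a single tutoring turn. This is a minimal illustration only: the system prompt wording, the `call_llm` stub, and the history format are assumptions for the sketch, not Socratic Mind's actual implementation.

```python
# Hypothetical sketch of one Socratic-questioning turn: rather than
# giving the answer, the assistant is instructed to reply with a
# probing question about the student's reasoning.

SYSTEM_PROMPT = (
    "You are a Socratic tutor. Never state the answer directly. "
    "Respond to the student with one probing question that targets "
    "a gap or assumption in their reasoning."
)

def call_llm(system, user):
    # Stand-in for a GenAI chat-completion call; a deployed tool
    # would send `system` and `user` to a real model endpoint.
    return "What assumption are you making about the base case?"

def socratic_turn(student_message, history):
    # Build the conversation context, query the model, and record
    # both sides of the exchange for the next turn.
    context = "\n".join(history + [f"Student: {student_message}"])
    reply = call_llm(SYSTEM_PROMPT, context)
    history.append(f"Student: {student_message}")
    history.append(f"Tutor: {reply}")
    return reply
```

Keeping the full exchange in `history` lets each follow-up question build on the student's earlier answers, which is what distinguishes a dialogic assessment from a one-shot quiz item.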


TAI Scan Tool: A RAG-Based Tool With Minimalistic Input for Trustworthy AI Self-Assessment

Davvetas, Athanasios, Ziouvelou, Xenia, Dami, Ypatia, Kaponis, Alexios, Giouvanopoulou, Konstantina, Papademas, Michael

arXiv.org Artificial Intelligence

This paper introduces the TAI Scan Tool, a RAG-based TAI self-assessment tool with minimalistic input. The current version of the tool supports legal TAI assessment, with a particular emphasis on facilitating compliance with the AI Act. It follows a two-step approach with a pre-screening and an assessment phase. The assessment output of the system includes insight into the risk level of the AI system according to the AI Act, while at the same time retrieving relevant articles to aid with compliance and notify users of their obligations. Our qualitative evaluation using use-case scenarios yields promising results, correctly predicting risk levels while retrieving relevant articles across three distinct semantic groups. Furthermore, interpretation of the results shows that the tool's reasoning relies on comparison with the setting of high-risk systems, a behaviour we attribute to the fact that the deployment of such systems requires careful consideration and is therefore frequently addressed within the AI Act.
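The retrieval half of a retrieve-then-assess flow like this can be illustrated with a toy example. The article snippets, the bag-of-words cosine similarity, and the `retrieve` function below are simplified assumptions for the sketch; a real RAG pipeline would use dense embeddings over the full legal text.

```python
from collections import Counter
from math import sqrt

# Hypothetical mini-corpus of AI Act article summaries (illustrative,
# not the actual legal text).
ARTICLES = {
    "Article 6": "classification rules for high-risk AI systems",
    "Article 9": "risk management system for high-risk AI systems",
    "Article 52": "transparency obligations for certain AI systems",
}

def bow(text):
    # Bag-of-words term counts as a crude stand-in for embeddings.
    return Counter(text.lower().split())

def cosine(a, b):
    common = set(a) & set(b)
    num = sum(a[w] * b[w] for w in common)
    den = (sqrt(sum(v * v for v in a.values()))
           * sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(query, k=2):
    # Rank articles by similarity to the user's system description.
    q = bow(query)
    ranked = sorted(ARTICLES,
                    key=lambda art: cosine(q, bow(ARTICLES[art])),
                    reverse=True)
    return ranked[:k]

print(retrieve("risk management obligations for a high-risk system"))
```

The assessment phase would then hand the retrieved articles, together with the pre-screening answers, to a language model to produce the risk-level judgement.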


Statistical Validation in Cultural Adaptations of Cognitive Tests: A Multi-Regional Systematic Review

Daga, Miit, Mohanty, Priyasha, Krishna, Ram, RM, Swarna Priya

arXiv.org Artificial Intelligence

This systematic review discusses the methodological approaches and statistical validations of cross-cultural adaptations of cognitive evaluation tools used with different populations. The review considers six seminal studies on the methodology of cultural adaptation in Europe, Asia, Africa, and South America. The results indicate that effective adaptations require holistic models that account for demographic factors: education alone explained as much as 26.76% of the variance in MoCA-H scores, and cultural-linguistic factors explained 6.89% of the variance in European adaptations of the MoCA-H. Another study, on adapted MMSE and BCSB instruments among Brazilian Indigenous populations, reported excellent diagnostic performance, with a sensitivity of 94.4% and a specificity of 99.2%. There was 78.5% inter-rater agreement in the evaluation of cultural adaptation using the Manchester Translation Evaluation Checklist. A paramount message of the paper is that culturally appropriate adaptation requires community feedback, standardized translation protocols, and robust statistical validation methodologies for developing cognitive assessment instruments. This review supplies evidence-based frameworks for the further adaptation of cognitive assessments in increasingly diverse global health settings.


Estimating Lexical Complexity from Document-Level Distributions

Wold, Sondre, Mæhlum, Petter, Hove, Oddbjørn

arXiv.org Artificial Intelligence

Existing methods for complexity estimation are typically developed for entire documents. This limitation in scope makes them inapplicable to shorter pieces of text, such as health assessment tools. These typically consist of lists of independent sentences, all of which are too short for existing methods to apply. The choice of wording in these assessment tools is crucial, as both the cognitive capacity and the linguistic competency of the intended patient groups can vary substantially. As a first step towards creating better tools for supporting health practitioners, we develop a two-step approach for estimating lexical complexity that does not rely on any pre-annotated data. We implement our approach for the Norwegian language and verify its effectiveness using statistical testing and a qualitative evaluation of samples from real assessment tools. We also investigate the relationship between our complexity measure and certain features typically associated with complexity in the literature, such as word length, frequency, and the number of syllables.
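One frequency-based notion of lexical complexity, of the kind the abstract relates its measure to, can be sketched as word surprisal under a reference distribution: rare words score high, common words score low. The toy corpus and scoring below are illustrative assumptions, not the paper's actual two-step method.

```python
import math
from collections import Counter

# Toy reference corpus standing in for a large document-level
# distribution (the paper works with Norwegian corpora).
REFERENCE = (
    "the patient can walk the patient can eat "
    "the patient can communicate the patient exhibits perseveration"
).split()

freq = Counter(REFERENCE)
total = sum(freq.values())

def word_complexity(word, smoothing=1):
    # Surprisal: negative log probability under the reference
    # distribution, with add-one smoothing for unseen words.
    p = (freq.get(word, 0) + smoothing) / (total + smoothing * len(freq))
    return -math.log(p)

def sentence_complexity(sentence):
    # Mean per-word surprisal, suitable for the short, independent
    # sentences found in assessment tools.
    words = sentence.lower().split()
    return sum(word_complexity(w) for w in words) / len(words)

# A sentence with rarer clinical vocabulary scores as more complex.
assert sentence_complexity("the patient exhibits perseveration") > \
       sentence_complexity("the patient can walk")
```

Averaging over words keeps the score comparable across sentences of different lengths, which matters when each assessment item is a single short sentence.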


Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models

Pan, Keyu, Zeng, Yawen

arXiv.org Artificial Intelligence

The field of large language models (LLMs) has made significant progress, and their knowledge storage capacity is approaching that of human beings. Furthermore, advanced techniques, such as prompt learning and reinforcement learning, are being employed to address ethical concerns and hallucination problems associated with LLMs, bringing them closer to aligning with human values. This situation naturally raises the question of whether LLMs with human-like abilities possess a human-like personality. In this paper, we aim to investigate the feasibility of using the Myers-Briggs Type Indicator (MBTI), a widely used human personality assessment tool, as an evaluation metric for LLMs. Specifically, we conduct extensive experiments to explore: 1) the personality types of different LLMs, 2) the possibility of changing the personality types by prompt engineering, and 3) how the training dataset affects the model's personality. Although the MBTI is not a rigorous assessment, it can still reflect the similarity between LLMs and human personality. In practice, the MBTI has the potential to serve as a rough indicator. Our codes are available at https://github.com/HarderThenHarder/transformers_tasks/tree/main/LLM/llms_mbti.
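Administering a forced-choice personality inventory to a model and tallying the answers can be sketched as below. The items, the pole mappings, and the `ask_model` stub are all illustrative assumptions; the real MBTI has 93 items, and a real harness would prompt the LLM and parse its chosen option.

```python
from collections import Counter

# Hypothetical forced-choice items. Each option maps to one pole of a
# dichotomy (E/I, S/N, T/F, J/P).
ITEMS = [
    ("At a party you...",
     {"talk to many people": "E", "talk to a few close friends": "I"}),
    ("You prefer...",
     {"concrete facts": "S", "abstract ideas": "N"}),
    ("Decisions should be...",
     {"logical": "T", "considerate of feelings": "F"}),
    ("Your work style is...",
     {"planned": "J", "flexible": "P"}),
]

def ask_model(question, options):
    # Stand-in for an LLM call; a real harness would prompt the model
    # with the question and parse which option it selects. Here we
    # deterministically pick the alphabetically first option.
    return sorted(options)[0]

def mbti_type(answer_fn):
    # Tally the pole chosen for each item, then take the majority
    # pole per dichotomy to form the four-letter type.
    tally = Counter()
    for question, options in ITEMS:
        choice = answer_fn(question, list(options))
        tally[options[choice]] += 1
    return "".join(a if tally[a] >= tally[b] else b
                   for a, b in [("E", "I"), ("S", "N"),
                                ("T", "F"), ("J", "P")])

print(mbti_type(ask_model))
```

Swapping `ask_model` for a prompted LLM call, with different system prompts, is exactly the kind of setup needed to test whether prompt engineering shifts the resulting type.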


A Study on the Performance of Generative Pre-trained Transformer (GPT) in Simulating Depressed Individuals on the Standardized Depressive Symptom Scale

Cai, Sijin, Zhang, Nanfeng, Zhu, Jiaying, Liu, Yanjie, Zhou, Yongjin

arXiv.org Artificial Intelligence

Background: Depression is a common mental disorder with a significant societal and economic burden. Current diagnosis relies on self-reports and assessment scales, which have reliability issues, so objective approaches to diagnosing depression are needed. Objective: Evaluate the potential of GPT technology in diagnosing depression, assess its ability to simulate individuals with depression, and investigate the influence of depression scales. Methods: Three depression-related assessment tools (HAMD-17, SDS, GDS-15) were used. In two experiments, GPT simulated the responses of normal individuals and individuals with depression. We compared GPT's responses with the expected results, assessed its understanding of depressive symptoms, and examined performance differences under different conditions. Results: GPT's responses aligned with the scoring criteria for both individuals with depression and normal individuals. Some performance differences were observed depending on depression severity, and GPT performed better on scales with higher sensitivity. Conclusion: GPT accurately simulates individuals with depression and normal individuals during depression-related assessments. Deviations occur when simulating different degrees of depression, limiting its representation of mild and moderate cases. GPT performs better on scales with higher sensitivity, indicating potential for developing more effective depression scales. GPT has important potential in depression assessment, supporting clinicians and patients.
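Scoring a self-rating scale of the SDS type, so that simulated responses can be compared against expected score ranges, can be sketched as follows. The item count, ratings, and which items are reverse-scored are illustrative assumptions, not the actual instrument.

```python
# Minimal sketch of scoring a Zung SDS-style self-rating scale:
# each item is rated 1-4, and positively worded items are
# reverse-scored so that higher totals always mean more symptoms.

REVERSE_SCORED = {2, 5}  # hypothetical positively worded items

def raw_score(responses):
    # responses: dict mapping item number -> rating in 1..4
    total = 0
    for item, rating in responses.items():
        assert 1 <= rating <= 4, "ratings must be on the 1-4 scale"
        total += (5 - rating) if item in REVERSE_SCORED else rating
    return total

# Example: five simulated item responses.
answers = {1: 3, 2: 1, 3: 4, 4: 2, 5: 1}
print(raw_score(answers))
```

An evaluation harness would generate `responses` from GPT under a "depressed" or "normal" persona prompt and check whether the resulting total falls in the score band expected for that persona.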


Designing trustworthy and transparent AI systems using assessment tools

#artificialintelligence

The hype around ChatGPT has brought the topic of artificial intelligence and its impressive potential to the fore. At the same time, ensuring the quality and maintaining control of AI systems are becoming increasingly important, especially when these systems take on responsible tasks. After all, the chatbot's results are based on huge amounts of text data from the internet, and systems like ChatGPT only compute the most likely answer to a question and output it as fact. Researchers from the Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS will be showcasing various assessment tools and processes that can be used to systematically examine AI systems for weaknesses throughout their life cycle and safeguard against AI risks at Hannover Messe 2023 from April 17 to 21 (at the joint Fraunhofer booth A12 in Hall 16).


Ergonomically Intelligent Physical Human-Robot Interaction: Postural Estimation, Assessment, and Optimization

Yazdani, Amir, Novin, Roya Sabbagh, Merryweather, Andrew, Hermans, Tucker

arXiv.org Artificial Intelligence

Ergonomics and human comfort are essential concerns in physical human-robot interaction applications, and common practical methods either fail to estimate the correct posture due to occlusion or rely on less accurate ergonomics models in their postural optimization. Instead, we propose a novel framework for posture estimation, assessment, and optimization for ergonomically intelligent physical human-robot interaction. We show that we can estimate human posture solely from the trajectory of the interacting robot. We propose DULA, a differentiable ergonomics model, and use it in gradient-free postural optimization for physical human-robot interaction tasks such as co-manipulation and teleoperation. We evaluate our framework through human and simulation experiments.


DULA: A Differentiable Ergonomics Model for Postural Optimization in Physical HRI

Yazdani, Amir, Novin, Roya Sabbagh, Merryweather, Andrew, Hermans, Tucker

arXiv.org Artificial Intelligence

Ergonomics and human comfort are essential concerns in physical human-robot interaction applications. Defining an accurate and easy-to-use ergonomic assessment model is an important step in providing feedback for postural correction to improve operator health and comfort. To enable efficient computation, previously proposed automated ergonomic assessment and correction tools make approximations or simplifications to the gold-standard assessment tools used by ergonomists in practice. To retain assessment quality while improving computational efficiency, we introduce DULA, a differentiable and continuous ergonomics model learned to replicate the popular and scientifically validated RULA assessment. We show that DULA provides assessments comparable to RULA while offering computational benefits. We highlight DULA's strength in a demonstration of gradient-based postural optimization for a simulated teleoperation task.
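The point of a differentiable score is that plain gradient descent can then drive the posture toward lower ergonomic risk. The quadratic penalty below is a hypothetical stand-in for the learned DULA model (which replaces the discrete, table-based RULA score); the neutral angle and learning rate are illustrative assumptions.

```python
# Gradient-based postural optimization sketch: descend a smooth,
# differentiable ergonomic penalty instead of a discrete RULA table.

NEUTRAL = 20.0  # hypothetical neutral joint angle, in degrees

def ergonomic_penalty(angle):
    # Smooth surrogate: zero at the neutral posture, growing
    # quadratically as the joint deviates from it.
    return (angle - NEUTRAL) ** 2

def penalty_grad(angle):
    # Analytic gradient of the quadratic penalty.
    return 2.0 * (angle - NEUTRAL)

def optimize_posture(angle, lr=0.1, steps=100):
    # Plain gradient descent toward the lowest-penalty posture.
    for _ in range(steps):
        angle -= lr * penalty_grad(angle)
    return angle

final = optimize_posture(60.0)
```

A discrete, table-based score has no useful gradient, which is why the learned continuous model is what makes this style of optimization possible.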


Can Artificial Intelligence Give Us Equal Justice?

#artificialintelligence

It's "misleading and counterproductive" to block the use of machine-learning algorithms in the justice system on the grounds that some of them may be subject to racial bias, according to a forthcoming study in the American Criminal Law Review. The use of artificial intelligence by judges, prosecutors, police and other justice authorities remains "the best means to overcome the pervasive bias and discrimination that exists in all parts of the deeply flawed criminal justice system," said the study. Algorithmic systems are used in a variety of ways in the U.S. justice system in practices ranging from identifying and predicting crime "hot spots" to real-time surveillance. More than 60 kinds of risk assessment tools are currently in use by court systems around the country, usually to weigh whether individuals should be held in detention before trial or can be released on their own recognizance. The risk assessment tools, which assign weights to data points such as previous arrests and the age of the offender, have come under fire from activists, judges, prosecutors, and some criminologists who say they are susceptible to bias themselves.